Skip to main content

Data Warehouse

When talking about a data warehouse, data lakehouse or data lake, it mostly refers to having a common place to store, manipulate and apply data.

Why is this important? It provides a common place for all types of data-related work and makes it better for data scientists and data engineers to actually do their jobs. From a personal perspective, my read on data warehouses is that it is a must if you want to apply data science and machine learning (traditional is the most likely). As a data scientist, it makes working and understanding data so much easier. So, if a company wants to get serious when it comes to data, then a data warehouse.

Data Lake

As data storage costs have fallen dramatically, there has been a movement to create data lakes, aka just throwing all types of files and data into a big ass lake and hoping that it might be useful. I understand the sentiment, but I feel like this is some sort of hoarder mentality of thinking it might be useful.

Microsoft Fabric

Microsoft Fabric is like an MS copy of data warehouses such as Snowflake and Databricks; it is more of a comprehensive platform with more to offer. In good MS EEE (Embrace-Extend-Extinguish) spirit, it is incompatible, developer-unfriendly and halfway done. But hey, you get good integration with PowerBI so totally worth it (Sarkasm).

Snowflake

Links

Thoughts